More generic definition than just a table or dataframe
2. Algebra
Produce combinations of variables
Join, concatenate, group by
3. Scales
Scale variables
Transformation like taking log or normalizing
4. Statistics
Compute statistical summaries
Generates a new varset
5. Geometry
Control the type of plot
point, line, area, path, bar, polygon, edge etc.
6. Coordinates
The coordinate system and faceting
Usually Cartesian, but also polar or geographic coordinates
7. Aesthetics
Actual mapping of variables to a perceivable graphic
Visual variables include position, size, shape, orientation, brightness, color, granularity. For interactive graphics also blur, sound and motion.
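To make the components above more concrete, here is a minimal sketch of a Vega-Lite style specification written as a plain Python dictionary. The field names (`continent`, `life_exp`) and the data values are made up for illustration, and the mapping from grammar component to spec key in the comments is a loose reading rather than an official correspondence.

```python
import json

# Hand-written Vega-Lite style spec; data and field names are hypothetical.
spec = {
    # Variables: the records the chart is built from
    "data": {
        "values": [
            {"continent": "Africa", "life_exp": 52.0},
            {"continent": "Africa", "life_exp": 58.0},
            {"continent": "Europe", "life_exp": 76.0},
            {"continent": "Europe", "life_exp": 80.0},
        ]
    },
    # Algebra + Statistics: group by continent, then compute a mean per group
    "transform": [
        {
            "aggregate": [
                {"op": "mean", "field": "life_exp", "as": "mean_life_exp"}
            ],
            "groupby": ["continent"],
        }
    ],
    # Geometry: the type of plot
    "mark": "bar",
    # Aesthetics: map variables to visual channels.
    # Scales are configured per channel; Cartesian coordinates are implied by x/y.
    "encoding": {
        "x": {"field": "continent", "type": "nominal"},
        "y": {
            "field": "mean_life_exp",
            "type": "quantitative",
            "scale": {"zero": False},
        },
        "color": {"field": "continent", "type": "nominal"},
    },
}

# Loose mapping from grammar component to the spec key that plays that role
roles = {
    "Algebra": "transform",
    "Statistics": "transform",
    "Scales": "encoding",
    "Geometry": "mark",
    "Aesthetics": "encoding",
}

for component, key in roles.items():
    print(f"{component:<11} -> spec[{key!r}]")

# The spec is plain JSON, so it can be serialized and handed to a renderer
spec_json = json.dumps(spec, indent=2)
```

Because the whole chart is declared as data, libraries such as Altair can generate this kind of specification from Python method calls instead of drawing commands.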
Vega-Lite’s Grammar of Interactions
Selection component
Description
Comments and examples
type
Way in which backing points are selected as minimal set to identify all selected points
point, list, interval
predicate
Logic to determine selected points
Inside or outside dragged area, within a range etc.
domain or range
Invert screen position to data values
Click on a mark for selecting single point, drag to select points in area etc.
event
The actual input event
Mouseover, selection by dragging
init
Initialize selection with specific points
Used for automatically determining scale extents
transforms
Manipulate selection
E.g. moving a rectangular selection
resolve
Re-evaluate visual encodings as selections change
Change color (highlighting), use selection as input for other encodings (cross-filtering), re-define scales etc.
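The selection components in the table can be illustrated with a small pure-Python sketch of an interval selection. This is not how Vega-Lite is implemented; `LinearScale`, `interval_selection` and `resolve_colors` are hypothetical helpers that only mimic the idea: invert the dragged pixel range to data values, apply a predicate, and re-evaluate an encoding for the selected points.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LinearScale:
    """Hypothetical linear scale: maps data values to pixels and back."""

    domain: Tuple[float, float]       # data extent
    pixel_range: Tuple[float, float]  # screen extent

    def invert(self, pixel: float) -> float:
        """Invert a screen position to a data value (the 'domain or range' step)."""
        d0, d1 = self.domain
        r0, r1 = self.pixel_range
        return d0 + (pixel - r0) * (d1 - d0) / (r1 - r0)


def interval_selection(
    points: List[Dict[str, float]],
    scale: LinearScale,
    drag_start_px: float,
    drag_end_px: float,
) -> List[Dict[str, float]]:
    """Type 'interval': two backing values identify all selected points."""
    lo, hi = sorted((scale.invert(drag_start_px), scale.invert(drag_end_px)))
    # Predicate: a point is selected when it lies inside the dragged interval
    return [p for p in points if lo <= p["x"] <= hi]


def resolve_colors(points, selected, on="red", off="grey"):
    """Re-evaluate the color encoding once the selection changes (highlighting)."""
    selected_x = {p["x"] for p in selected}
    return [on if p["x"] in selected_x else off for p in points]


points = [{"x": float(v)} for v in (0, 25, 50, 75, 100)]
scale = LinearScale(domain=(0.0, 100.0), pixel_range=(0.0, 400.0))

# Event: the user drags from pixel 300 back to pixel 100
selected = interval_selection(points, scale, drag_start_px=300, drag_end_px=100)
colors = resolve_colors(points, selected)
print(colors)
```

Dragging right-to-left works because the two inverted values are sorted before the predicate is applied.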
Choosing your Python libraries for interactive visualizations and dashboards
Pythonistas are somewhat envious of the fact that the R stack has set the standard for interactive data visualizations and dashboarding with ggplot2 and Shiny. But recently Python has caught up, going by the number of stars on GitHub.
For your interactive data visualization work, you need to make two choices:
Choose your interactive plotting library for making figures. Altair, Bokeh and Plotly are the most popular ones; you can find a more detailed comparison here
Choose your dashboarding library for making, you guessed it, dashboards. Streamlit, Voilà, Plotly Dash and Panel are the most popular ones; you can read more here and here
Note that although many plotting libraries are supported in the dashboarding libraries, some integrations work better than others. Sticking to the same ecosystem yields the following combinations:
Plotly + Plotly Dash: backed by a Canadian company under the same name, this is an excellent stack to work in. Over time you can upgrade to a paid (enterprise) version including low-code development environments and hosting for ease of sharing apps.
Bokeh + Panel: pure open source libraries which are financially supported by NumFOCUS and Anaconda. Complete freedom to integrate these libraries into your own stack without ever having to worry about licensing.
Altair + Streamlit: the new kids on the block, but with an impressive pedigree. The University of Washington Interactive Data Lab are the core developers of Altair, with Jeffrey Heer, Jake VanderPlas and Mike Bostock amongst their ranks. Note that Tableau is a spin-off from this community, too. Streamlit is incorporated in the US, but its creators are spread all over the world. It was acquired by Snowflake in 2022.
```python
from dataclasses import dataclass

import altair as alt
from bokeh.models import (
    Button,
    CategoricalColorMapper,
    ColumnDataSource,
    HoverTool,
    Label,
    LogTicker,
    Slider,
)
from bokeh.palettes import Spectral6
from bokeh.plotting import figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import streamlit as st


@dataclass
class Gapminder:
    """Class for storing Gapminder data and plots"""

    url: str = "https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv"
    year: int = 1952
    show_data: bool = False
    show_legend: bool = True
    chart_height: int = 500

    def __post_init__(self):
        self.dataset = pd.read_csv(self.url)
        self.df = self.get_data()
        self.title = f"Life expectancy vs. GDP ({self.year})"
        self.xlabel = "GDP per capita (2000 dollars)"
        self.ylabel = "Life expectancy (years)"
        self.xlim = (self.df["gdpPercap"].min() - 100, self.df["gdpPercap"].max() + 1000)
        self.ylim = (20, 90)

    def get_data(self):
        """
        Return gapminder data for a given year.

        Countries with gdpPercap of 10,000 or higher are discarded.
        """
        df = self.dataset[
            (self.dataset.year == self.year) & (self.dataset.gdpPercap < 10000)
        ].copy()
        # marker size derived from population
        df["size"] = np.sqrt(df["pop"] * 2.666051223553066e-05)
        return df

    def altair(self):
        # passing legend=None to alt.Color hides the legend
        legend = {} if self.show_legend else {"legend": None}
        plot = (
            alt.Chart(self.df)
            .mark_circle()
            .encode(
                alt.X(
                    "gdpPercap:Q",
                    scale=alt.Scale(type="log"),
                    axis=alt.Axis(title=self.xlabel),
                ),
                alt.Y(
                    "lifeExp:Q",
                    scale=alt.Scale(zero=False, domain=self.ylim),
                    axis=alt.Axis(title=self.ylabel),
                ),
                size=alt.Size("pop:Q", scale=alt.Scale(type="log"), legend=None),
                color=alt.Color(
                    "continent", scale=alt.Scale(scheme="category10"), **legend
                ),
                tooltip=["continent", "country", "gdpPercap", "lifeExp"],
            )
            .properties(title="Altair", height=self.chart_height)
            .configure_title(anchor="start")
        )
        return plot.interactive()

    def plotly(self):
        traces = []
        # one trace per continent; use a local name so self.df is not overwritten
        for continent, df in self.df.groupby("continent"):
            marker = dict(
                symbol="circle",
                sizemode="area",
                sizeref=0.1,
                size=df["size"],
                line=dict(width=2),
            )
            traces.append(
                go.Scatter(
                    x=df.gdpPercap,
                    y=df.lifeExp,
                    mode="markers",
                    marker=marker,
                    name=continent,
                    text=df.country,
                )
            )
        axis_opts = dict(
            gridcolor="rgb(255, 255, 255)", zerolinewidth=1, ticklen=5, gridwidth=2
        )
        layout = go.Layout(
            title="Plotly",
            showlegend=self.show_legend,
            height=self.chart_height,
            xaxis=dict(title=self.xlabel, type="log", **axis_opts),
            yaxis=dict(title=self.ylabel, **axis_opts),
        )
        return go.Figure(data=traces, layout=layout)

    def bokeh(self):
        # note bokeh version issue https://discuss.streamlit.io/t/bokeh-2-0-potentially-broken-in-streamlit/2025/8
        source = ColumnDataSource(self.df)
        color_mapper = CategoricalColorMapper(
            palette=Spectral6, factors=self.df.continent.unique()
        )
        plot = figure(title="Bokeh", x_axis_type="log", height=self.chart_height)
        plot.xaxis.axis_label = self.xlabel
        plot.xaxis.ticker = LogTicker()
        plot.yaxis.axis_label = self.ylabel
        plot.scatter(
            x="gdpPercap",
            y="lifeExp",
            size="size",
            source=source,
            fill_color={"field": "continent", "transform": color_mapper},
            fill_alpha=0.8,
            line_color="#7c7e71",
            line_width=0.5,
            line_alpha=0.5,
            legend_group="continent",
        )
        plot.add_tools(
            HoverTool(
                tooltips=[
                    ("continent:", "@continent"),
                    ("country:", "@country"),
                    ("GDP per capita:", "@gdpPercap"),
                    ("Life expectancy:", "@lifeExp"),
                ],
                show_arrow=False,
                point_policy="follow_mouse",
            )
        )
        return plot

    def pyplot(self):
        data = self.df
        title = "Matplotlib"
        fig, ax = plt.subplots(figsize=(3, 3))
        ax.set_xscale("log")
        ax.set_title(title, fontsize=16)
        ax.set_xlabel(self.xlabel, fontsize=10)
        ax.set_ylabel(self.ylabel, fontsize=10)
        ax.set_ylim(self.ylim)
        ax.set_xlim(self.xlim)
        for continent, df in data.groupby("continent"):
            ax.scatter(
                df.gdpPercap,
                y=df.lifeExp,
                s=df["size"] * 5,
                edgecolor="black",
                label=continent,
            )
        if self.show_legend:
            ax.legend(loc=4)
        return fig


# initiate
gapminder = Gapminder()
st.set_page_config(layout="wide")

# side bar
st.sidebar.subheader("Widgets")
st.sidebar.markdown("Use the slider to show data from subsequent years.")
gapminder.year = st.sidebar.slider(label="", min_value=1952, max_value=2007, step=5)
gapminder.show_legend = st.sidebar.checkbox("Toggle legend", gapminder.show_legend)
gapminder.df = gapminder.get_data()

# main body
st.title("Gapminder in different ways")
st.markdown(
    """Demo of different interactive plotting libraries reproducing the classic [Gapminder bubble chart](https://discuss.streamlit.io/t/bokeh-2-0-potentially-broken-in-streamlit/2025/8). """
)
with st.expander("Show data"):
    st.dataframe(gapminder.df)

col1, col2 = st.columns([1, 1])
with col1:
    st.altair_chart(gapminder.altair(), True)
    st.plotly_chart(gapminder.plotly(), True)
with col2:
    st.bokeh_chart(gapminder.bokeh(), use_container_width=True)
    st.pyplot(gapminder.pyplot(), False)
```
Altair
Structure of an Altair plot
alt
convention to import altair as alt
.Chart(data)
instantiate Chart object with data
.transform_{aggregate|bin|calculate|…}
apply transformations before visualization
.mark_{area|bar|circle|…}
choose the geometry, i.e. the type of plot
.encode(x=.., y=.., color=..)
mapping of variables to a perceivable graphic
.add_selection(…)
define type and predicates for interactive selections
.transform_filter(…)
apply selection filter
.properties(width=…, height=…)
set properties of figure
.interactive()
enable panning and zooming
Workshop exercises
We are going to use Altair for exploratory data analysis (EDA), with a dataset of your choice. Try to make the following, starting with a simple graph and building up to more complex interactions:
Create a histogram of a feature of interest using alt.Chart().mark_bar()
```python
# more verbose solution, to show how you can parametrize composition in Altair
bar_ = (alt
    .Chart()
    .mark_bar()
    .encode(x="month(date):T", y="mean(precipitation):Q")
)
rule_ = (alt
    .Chart()
    .mark_rule(color="firebrick")
    .encode(y="mean(precipitation):Q")
)
alt.layer(bar_, rule_, data=df).facet(facet="weather:O", columns=2)
```
Extend your code to build a small multiple with interaction for each multiple. Feel free to submit your solution, so it can be included in this notebook.
Sharing data between callbacks: when your datasets get too large, you need to store them somewhere and keep track of their state.
Working with callbacks does have limitations. If you notice that you need to nest callbacks (callback A -> callback B -> callback C -> final result), you are on your way to the Callback Hell a.k.a. the Pyramid of Doom. Stop and reconsider before continuing.
My personal recommendations
Choose any interactive plotting library and get to know it: altair, bokeh or plotly
Choose any of the higher level APIs to be productive in your data analysis work: Plotly Express or Altair
Choose any of the dashboarding libraries to make an interactive notebook app: Streamlit or Dash
Don’t try to build your own BI tool. Buy one. It saves time and money. (PS: have a look at redash.io)